Back to README

rm(list=ls())
# Load R package for printing
library(knitr)

Aim

To model and analyze point reference spatial datasets in R by using the R package gstat.


Reading material

Lecture notes and bibliography:

References for R, Rmarkdown, and LaTeX (optional, supplementary material):


New software

We will learn how to use the R package gstat:

gstat: Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation


Datasets involved

Meuse river data set

Jura data set


R package gstat

We study how to implement basic point reference modeling and analysis in R computing environment by using the R package gstat (Pebesma, 2004), in conjunction to the R packages sf and sp for handling spatial data objects.

The R package gstat (Pebesma, 2004) provides basic functionality for univariable and multivariable geostatistical modeling, prediction, simulation, and analysis. This includes:

1 Simulate a Random Field

R function grf {geoR} is generates (unconditional) simulations of Gaussian random fields for given covariance parameters.

Important input arguments:

for more details ?cov.spatial and ?grf.

The following R script simulates \(n=100\) random points. It uses an Exponential covariance function which is set by cov.model = “exponential”, with covariance functions parameters 1 for the partial sill and .25 for the range which is set by cov.pars=. It uses no nugget which is set by nugget = 0. That is

\[ c\left(h\right)=\sigma^{2}\exp\left(-\frac{1}{\phi}\left\Vert h\right\Vert _{1}\right) \]

rm(list=ls())
# simulate
library(geoR)
## --------------------------------------------------------------
##  Analysis of Geostatistical Data
##  For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
##  geoR version 1.9-4 (built on 2024-02-14) is now loaded
## --------------------------------------------------------------
sim.grf <- grf(n = 100,              # number of points (spatial locations) in each simulations.
            cov.model = "exponential", # matern cov function family
            cov.pars = c(1, .25), # a vector with 2 elements with values of the covariance 
                                  # parameters \sigma^2(partial sill) and \phi (range parameter)
            nugget = 0.0,         # the value of the nugget effect parameter
            xlims = c(0, 1), 
            ylims = c(0, 1) 
            )
## Warning in grf(n = 100, cov.model = "exponential", cov.pars = c(1,
## 0.25), : .Random.seed not initialised. Creating it with by calling runif(1)
## grf: simulation(s) on randomly chosen locations with  100  points
## grf: process with  1  covariance structure(s)
## grf: nugget effect is: tausq= 0 
## grf: covariance model 1 is: exponential(sigmasq=1, phi=0.25)
## grf: decomposition algorithm used is:  cholesky 
## grf: End of simulation procedure. Number of realizations: 1
# create an sf object
sim.grf.df <- data.frame(x = sim.grf$coords[,1],
                         y = sim.grf$coords[,2],
                         value = sim.grf$data
                        )
library(sf)
## Linking to GEOS 3.12.2, GDAL 3.8.5, PROJ 9.3.1; sf_use_s2() is TRUE
sim.grf.sf <- st_as_sf( sim.grf.df, 
                        coords = c("x", "y")
)
# plot
library(ggplot2)
ggplot(data = sim.grf.sf) + 
  geom_sf(aes(color = value)) +
  theme_minimal() +
  labs(y= "y", x = "x", title ="GRF")

Task

Simulate and plot a GRF with

Note: At home you may have a look at Algorithm 1

#
# Write your code below.
#
rm(list=ls())
# simulate
library(geoR)
sim.grf <- grf(n = 500,              # number of points (spatial locations) in each simulations.
            cov.model = "gaussian", # matern cov function family
            cov.pars = c(2, 1.5), # a vector with 2 elements with values of the covariance 
                                  # parameters \sigma^2(partial sill) and \phi (range parameter)
            nugget = 0.0,         # the value of the nugget effect parameter
            xlims = c(0, 1)
            )
## grf: simulation(s) on randomly chosen locations with  500  points
## grf: process with  1  covariance structure(s)
## grf: nugget effect is: tausq= 0 
## grf: covariance model 1 is: gaussian(sigmasq=2, phi=1.5)
## grf: decomposition algorithm used is:  cholesky 
## trying another decomposition (svd)
## grf: End of simulation procedure. Number of realizations: 1
# create an sf object
sim.grf.df <- data.frame(x = sim.grf$coords[,1],
                         y = sim.grf$coords[,2],
                         value = sim.grf$data
                        )
library(sf)
sim.grf.sf <- st_as_sf( sim.grf.df, 
                        coords = c("x", "y")
)
# plot
library(ggplot2)
ggplot(data = sim.grf.sf) + 
  geom_sf(aes(color = value)) +
  theme_minimal() +
  labs(y= "y", x = "x", title ="GRF")


2 General info

In what follows we work on the geostatistical model specified as a stochastic process \(\left(Z\left(s\right);s\in\mathcal{S}\right)\) with

\[\begin{equation} Z\left(s\right)=\mu\left(s\right)+\delta\left(s\right)\label{eq:-39} \end{equation}\]

where \(\mu\left(s\right)\) is a deterministic linear expansion of known basis functions \(\left\{ \psi_{j}\left(\cdot\right)\right\} _{j=0}^{p}\) and unknown coefficients \(\left\{ \beta_{j}\right\} _{j=0}^{p}\) such as

\[\begin{align*} \mu\left(s\right) & =\sum_{j=0}^{p}\psi_{j}\left(s\right)\beta_{j}=\left(\psi\left(s\right)\right)^{\top}\beta \end{align*}\]

with \(\beta=\left(\beta_{0},...,\beta_{p}\right)^{\top}\) and \(\psi\left(s\right)=\left(\psi_{0}\left(s\right),...,\psi_{p}\left(s\right)\right)^{\top}\).

Also , \(\delta\left(s\right)\) is a zero mean intrinsic random field with semivariogram \(\gamma(\cdot)\).


3 Sample semivariogram

The R function variogram {gstat} calculates the sample semivariogram cloud and the sample semivariogram from data \(Z(\cdot)\).

In case a linear drift is given, it calculates the above for the residuals \(\delta(\cdot)=Z(\cdot)-\mu(\cdot)\), with options for isotropic / anysotropic semivariogram, and for irregular / regular distance intervals.

Important input arguments

data= data frame where the names in formula are to be found

Outputs

3.1 Semivariogram cloud

The following R script computes and plots the semivariogram cloud corresponding to \(\delta(\cdot)\). It

  1. loads the data set meuse{sp}

  2. computes the semivariogram cloud by considering a drift \(\mu(s)=\beta_0+\beta_1\sqrt{d(s)}\) where \(d(s)\) denotes the covariate “dist” from the dataset meuse{sp}.

  3. plots the semivariogram cloud

rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )
# 2 Compute the sample variogram cloud
library(gstat)
meuse.sv.cld <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula defining the response vector and (possible) regressors
                 data = meuse.sf,                        # data frame
                 cloud = TRUE                         # if TRUE, it calculates the semivariogram cloud
                 )
# ...with some investigation
names(meuse.sv.cld)
## [1] "np"      "dist"    "gamma"   "dir.hor" "dir.ver" "id"
head(meuse.sv.cld)
# 3 Plot
plot(meuse.sv.cld,
      xlab = 'distance', 
      ylab = 'gamma', 
      main = 'semivariogram cloud'
)

# or
# plot(x = cld$dist,
#      y = cld$gamma,
#      cex = .5, 
#      pch = 3, 
#      col = 'blue', 
#      xlab = 'distance', 
#      ylab = 'gamma', 
#      main = 'semivariogram cloud'
# )

Task

Consider the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil but in log-scale. Consider a constant deterministic drift function \(\mu(.)=\beta_0\) where \(\beta_0\) is unknown. Compute and plot the the semivariogram cloud.

rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df, 
                         coords = c("long", "lat"),      # specify which columns are the coordinates
                         crs = 4326  # CRS code
)
#
# Write your code below
#
# 2 Compute the sample variogram cloud
library(gstat)
jura.pred.sv.cld <- variogram(object = log(Cd) ~ 1, # formula defining the response vector and (possible) regressors
                 data = jura.pred.sf,                        # data frame
                 cloud = TRUE                         # if TRUE, it calculates the semivariogram cloud
                 )
# ...with some investigation
names(jura.pred.sv.cld)
## [1] "np"      "dist"    "gamma"   "dir.hor" "dir.ver" "id"
head(jura.pred.sv.cld)
# 3 Plot
plot(jura.pred.sv.cld,
      xlab = 'distance', 
      ylab = 'gamma', 
      main = 'semivariogram cloud'
)

3.2 Isotropic sample semivariogram (non-parametric estimate)

The following R script computes and plots the non-parametric Matheron’s semi-variogram estimate corresponding to \(\delta(\cdot)\).

For this, function variogram {gstat} uses the input argument cloud=FALSE. It

  1. loads the data set meuse{sp}

  2. computes the sample semivariogram by considering a drift \(\mu(s)=\beta_0+\beta_1\sqrt{d(s)}\) where \(d(s)\) denotes the covariate “dist” from the dataset meuse{sp}.

  3. plots the semivariogram cloud

rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )

svgm <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula defining the response vector and (possible) regressors
                  data = meuse.sf,                        # data frame
                  cloud=FALSE
                  )
#
summary(svgm)
##        np             dist             gamma           dir.hor     dir.ver 
##  Min.   : 57.0   Min.   :  79.29   Min.   :0.0882   Min.   :0   Min.   :0  
##  1st Qu.:435.5   1st Qu.: 425.61   1st Qu.:0.1693   1st Qu.:0   1st Qu.:0  
##  Median :477.0   Median : 796.18   Median :0.1930   Median :0   Median :0  
##  Mean   :458.9   Mean   : 799.98   Mean   :0.1918   Mean   :0   Mean   :0  
##  3rd Qu.:545.0   3rd Qu.:1169.60   3rd Qu.:0.2315   3rd Qu.:0   3rd Qu.:0  
##  Max.   :589.0   Max.   :1543.20   Max.   :0.2550   Max.   :0   Max.   :0  
##     id    
##  var1:15  
##           
##           
##           
##           
## 
#
plot(x = svgm$dist,
     y = svgm$gamma,
     cex = .5,
     pch = 3,
     col = 'blue',
     xlab = 'distance',
     ylab = 'gamma',
     main = 'sample variogram'
     )
#
text(svgm$dist, 
     svgm$gamma, 
     svgm$np, 
     adj = c(0,2)
     )

3.3 Anysotropic sample semivariogram (non-parametric estimate)

Function variogram {gstat} computes the non-parametric Matheron’s semi-variograme estimate for specific angles by using the additional input argument

  • angle= indicating the direction in plane (x,y), in positive degrees clockwise from positive y (North): alpha=0 for direction North (increasing y), alpha=90 for direction East (increasing x); optional a vector of directions in (x,y).

Furthermore, function variogram {gstat} allows the use of other input arguments such as

  • cutoff indicating spatial separation distance up to which point pairs are included in semivariance estimates

  • width indicating the width of subsequent distance intervals into which data point pairs are grouped for semivariance estimates

The following script computes the semi-variogram estimate for

  • angle equal to \(45\) degrees,

  • cutoff distance equal to \(1000\), and

  • width equal to \(50\).

rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )
# 2 Compute the semivariogram estimate
svgm.tmp <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula defining the response vector and (possible) regressors
                  data = meuse.sf,               # data frame
                  alpha = c(45),              # direction in plane (x,y),
                  cutoff = 1000,              # spatial separation distance up to which point pairs are included in semivariance estimates
                  width = 50                  # the width of subsequent distance intervals into which data point pairs are grouped for semivariance estimates
                  )
# plot
plot( svgm.tmp )

Task

Consider the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil.

Consider a constant deterministic drift function \(\mu(.)=\beta_0\) where \(\beta_0\) is unknown.

Compute and plot the the sample semivariogram cloud with angle equal to \(0\), \(45\), \(60\), and \(90\) degrees, cutoff distance equal to \(1.5\), and width equal to \(0.1\)

rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df, 
                         coords = c("long", "lat"),      # specify which columns are the coordinates
                         crs = 4326  # CRS code
)
#
# Write your code below
#
# 2 Compute the semivariogram estimate
svgm.tmp <- variogram( object = log(Cd) ~ 1 , # formula defining the response vector and (possible) regressors
                  data = jura.pred.sf,                # data frame
                  alpha = c(0,45,60,90),
                  cutoff = 1.5,
                  width = 0.1
)
plot( svgm.tmp )


4 Specification and fit of the parametric semivariogram model

This is tricky a bit, the main steps are as follows

  1. compute a sample semigariogram given the data by using variogram{gstats} as previously

  2. specify the parametric form of the semivariogram model by using vgm{gstats}

  3. calibrate / estimate / fit the specified parametric semivariogram against the computed sample semivariogram by using fit.variogram{gstats}

4.1 Step 2: Specification of the (parametric) semivariogram model

The function vgm{gstats} specifies / generates a semivariogram model, or adds a semivariogram model to an existing model with purpose to create a mode general one.

Important input arguments of vgm{gstats} are:

  • psill (partial) sill of the variogram model component, or model

  • model semivariogram model type, (e.g. “Exp”, “Sph”, “Gau”, or “Mat”).

    • Can be a character vector of model types combined with c(), e.g. c(“Exp”, “Sph”), in which case the best fitting is returned.
  • range range parameter of the variogram model component; in case of anisotropy: major range / direction of the ellipsoid (in 2D case)

  • nugget nugget component of the variogram (this basically adds a nugget compontent to the model); if missing, nugget component is omitted

  • add.to the variogram model to which we want to add a component (structure)

  • cutoff maximum distance up to which variogram values are computed

  • kappa smoothness parameter for the Matern class of variogram models

Here is a visual

The basic parametric semivariogram models available in vgm{gstats} are

Examples of the basic parametric semivariogram models using specific parametric values are:

The following script specifies parametric semivariogram model resulting by adding several standard semivariogram models:

\[ \gamma\left(h\right)=\gamma_{\text{nug}}\left(h;\sigma^{2}=0.5\right)+\gamma_{\text{sph}}\left(h;\sigma^{2}=1,\phi=300\right)+\gamma_{\text{sph}}\left(h;\sigma^{2}=0.8,\phi=800\right) \]

where \(\gamma_{\text{nug}}\left(h;\sigma^{2}\right)=\sigma^{2}\text{1}_{\{0\}}\left(\left\Vert h\right\Vert \right)\) is the nugget semivariogram model.

rm(list = ls())
# 1 Specify the semivariogram model  
vgm.tmp <- vgm(psill = 0.5,
          model = "Nug", 
          range = 0
          )
vgm.tmp <- vgm(psill = 1,
          model = "Sph", 
          range = 300,
          add.to = vgm.tmp
          )
vgm.tmp <- vgm(psill = 0.8, 
          model = "Sph", 
          range = 800, 
          add.to = vgm.tmp
          )
# 2 print the values of the parameters of the specified semivariogram model  
print(vgm.tmp)
##   model psill range
## 1   Nug   0.5     0
## 2   Sph   1.0   300
## 3   Sph   0.8   800

4.2 Step 3: Fit the parametric semivariogram model against the sample semivariogram

The function fit.variogram {gstat} fits ranges and/or sills from a simple or nested variogram model to a sample variogram.

Important input arguments:

  • object= gets the sample variogram, output of variogram

  • model= variogram model, output of vgm

The following script fits the semivariogram model

\[ \gamma\left(h\right) =\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2} \right) +\gamma_{\text{sph}}\left(h;\sigma_{\text{sph}}^{2},\phi_{\text{sph}}\right) \]

(aka estimates \(\sigma_{\text{nug}}^{2}\), \(\sigma_{\text{sph}}^{2}\), and \(\phi_{\text{sph}}\)) against the sample variogram via weighted least squares. Then it prints the result.

Note that the values in the input arguments related to the semivariogram parameters psill = 0.5, psill = 1, and range = 300 are just initial seed values for the numerical solver used.

rm(list = ls())
library(sp)
library(sf)
# 0 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )
# 1 Specify the semi variogram model
smpl.vgm <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula associated to the trend
                      data = meuse.sf,           # data frame
                      )
vgm.obj <- vgm(psill = 0.5,
                model = "Nug", 
                range = 0
                )
vgm.obj <- vgm(psill = 1,                   # the value "=1" is just an initial value
                 model = "Sph",
                 range = 300,
                 add.to = vgm.obj
                )
# 2 Fit the semi variogram model against the sample variogram
est.vgm <- fit.variogram(object = smpl.vgm, # sample variogram, output of variogram
                         model = vgm.obj    # variogram model, output of vgm
                         )
# 3 Print the estimated values for the parameters of the semi variogram model   
print(est.vgm)
##   model      psill   range
## 1   Nug 0.07981775   0.000
## 2   Sph 0.14905741 872.719

Task

Consider the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil.

Consider a constant deterministic drift function \(\mu(.)=\beta_0\) where \(\beta_0\) is unknown.

Fit the parametric family of semivariograms

\[ \gamma\left(h\right)=\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2}\right)+\gamma_{\text{Exp}}\left(h;\sigma_{\text{Exp}}^{2},\phi_{\text{Exp}}\right) \]

where \(\gamma_{\text{nug}}\left(h;\sigma^{2}\right)=\sigma^{2}\text{1}_{\{0\}}\left(\left\Vert h\right\Vert \right)\) is the nugget semivariogram model. Use suitable initial values for \(\sigma_{\text{nug}}^{2}\), \(\sigma_{\text{Exp}}^{2}\), \(\phi_{\text{Exp}}\)

Consider the sample semivariogram with cutoff distance equal to \(1.5\), and width equal to \(0.1\). Also assume isotropy.

Print the semivariogram plot.

rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df, 
                         coords = c("long", "lat"),      # specify which columns are the coordinates
                         crs = 4326  # CRS code
)
#
# Write your code below
#
# Step 1
smpl.vgm <- variogram(object = log(Cd) ~ 1 , # formula defining the response vector and (possible) regressors
                  data = jura.pred.sf,            # data frame
                  cutoff = 1.5,              # spatial separation distance up to which point pairs are included in semivariance estimates
                  width = 0.1                  # the width of subsequent distance intervals into which data point pairs are grouped for semivariance estimates
                  )
# Step 2
vgm.model <- vgm(psill = 0.1,
          model = "Nug", 
          range = 0
          )
vgm.model <- vgm(psill = 0.5,
          model = "Exp",
          range = 1.5,
          add.to = vgm.model
          )
print(vgm.model)
##   model psill range
## 1   Nug   0.1   0.0
## 2   Exp   0.5   1.5
# Step 3
est.vgm.model <- fit.variogram(object = smpl.vgm, # sample variogram, output of variogram
                         model = vgm.model    # variogram model, output of vgm
                         )
#
print(est.vgm.model)
##   model    psill      range
## 1   Nug 0.000000 0.00000000
## 2   Exp 0.437191 0.08009554
#
plot(est.vgm.model, 
     cutoff = 1.9)


5 Spatial prediction by Kriging.

The main steps for Universal and simple Kriging are the following

  1. Specify the semivariogram model and fit it against the sample semivariogram estimate by using vgm{gstats}, variogram{gstats}, and fit.variogram{gstats} as previously.

  2. Train the geostatistical model by using gstat {gstat}

  1. Compute predictive mean and variance values at unseen points by using predict {gstat}

Function gstat{gstat} (has similar structure to lm{stats}) produces objects that hold all the information necessary for univariate geostatistical prediction (simple, ordinary or universal kriging).

Important input arguments:

5.1 Train the geostatistical model

The following script trains the geostatistical model

\[ Z\left(s\right)=\mu\left(s\right)+\delta\left(s\right) \]

against the meuse{sp} dataset.

The deterministic trend is

\[ \mu\left(s\right)=\beta_{0}+\beta_{1}\underset{=\psi_{1}\left(s\right)}{\underbrace{\sqrt{D\left(s\right)}}} \]

where \(D\left(s\right)\) denotes the variable meuse$distance, \(\beta\) is unknown, and \(\delta\left(s\right)\) is an intrinsic stationary process with variogram \[ \gamma\left(h\right) =\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2} \right) +\gamma_{\text{sph}}\left(h;\sigma_{\text{sph}}^{2},\phi_{\text{sph}}\right) \]

rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )
# 2 Compute the semivariogram estimate
svgm.tmp <- variogram(object = log(zinc) ~ 1 + sqrt(dist), # formula defining the response vector and (possible) regressors
                  data = meuse.sf,               # data frame
                  alpha = c(45),              # direction in plane (x,y),
                  cutoff = 1000,              # spatial separation distance up to which point pairs are included in semivariance estimates
                  width = 50                  # the width of subsequent distance intervals into which data point pairs are grouped for semivariance estimates
                  )

# set the formula of the ternd  
#
frml <- log(zinc) ~ 1 + sqrt(dist)
#
# Estimate the variogram  
#
smpl.vgm <- variogram(object = frml, 
               data = meuse.sf)
#
par.vgm.0 <- vgm(psill = 0.5,
                model = "Nug", 
                range = 0
                )
par.vgm.1 <- vgm(psill = 1,
                 model = "Sph",
                 range = 300,
                 add.to = par.vgm.0
                )
par.vgm.est <- fit.variogram(object = smpl.vgm, 
                         model = par.vgm.1
                         )
#
# compute the Kriging equations
# 
uni.krige.est <- gstat( formula = frml,
                    data = meuse.sf, 
                    model = par.vgm.est
                )
#
#
# summary
#
summary(uni.krige.est)
##       Length Class  Mode
## data  1      -none- list
## model 1      -none- list
## call  4      -none- call

5.2 Compute predictive mean and variance

R function predict {gstat} provides prediction methods for simple, ordinary, and universal kriging, point- or block-kriging.

Important input arguments

  • object object of class gstat

  • newdata data frame with prediction/simulation locations; should contain columns with the independent variables (if present) and the coordinates with names as defined in the data argument of used for training with gstat.

The following script

  1. computes the predictive mean and variance for the geostatistical model at unseen locations included in the data.frame meuse.grid {sp} Note that we have to convert the data.frame meuse.grid to sf object having coordinates and covariates with the same namce as the meuse data we used for training.

  2. prints the predictions

  3. plots the variance of the predictions

#rm(list=ls())
# 0 Get the data => from previous script
# 1 Train the geostatistical model => from the previous script
# 2.1 Get the unseen locations  
data(meuse.grid)
meuse.grid.df <- meuse.grid
meuse.grid.sf <- st_as_sf(meuse.grid.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )
# 2.1 Compute the predictions at the unseen locations
uni.krige.prd <- predict(object = uni.krige.est, # object of class gstat; output of gstat()
                     newdata = meuse.grid.sf   # data frame with prediction locations
                     )
## [using universal kriging]
# 2.2 Print summaries
print(uni.krige.prd)
## Simple feature collection with 3103 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 178460 ymin: 329620 xmax: 181540 ymax: 333740
## Projected CRS: Amersfoort / RD New
## First 10 features:
##    var1.pred  var1.var              geometry
## 1   7.071055 0.1683803 POINT (181180 333740)
## 2   7.088269 0.1518497 POINT (181140 333700)
## 3   6.785167 0.1537665 POINT (181180 333700)
## 4   6.512937 0.1576136 POINT (181220 333700)
## 5   7.104999 0.1343482 POINT (181100 333660)
## 6   6.804171 0.1368076 POINT (181140 333660)
## 7   6.572142 0.1414102 POINT (181180 333660)
## 8   6.420664 0.1469311 POINT (181220 333660)
## 9   7.023982 0.1177907 POINT (181060 333620)
## 10  6.820917 0.1196876 POINT (181100 333620)
summary(uni.krige.prd)
##    var1.pred        var1.var               geometry   
##  Min.   :4.455   Min.   :0.1009   POINT        :3103  
##  1st Qu.:5.212   1st Qu.:0.1160   epsg:28992   :   0  
##  Median :5.590   Median :0.1231   +proj=ster...:   0  
##  Mean   :5.702   Mean   :0.1299                       
##  3rd Qu.:6.134   3rd Qu.:0.1376                       
##  Max.   :7.477   Max.   :0.2112
# 2.4 Plot predictions
library(mapview)
mapview(uni.krige.prd, 
        zcol= "var1.pred", 
        layer.name = "predicted mean"
        )
mapview(uni.krige.prd, 
        zcol= "var1.var", 
        layer.name = "predicted varaince"
        )

Task (Simple Kriging)

Consider the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil.

Consider the same above geostatistical model with difference that

  • the deteministic trend is known and equal to

    \[ \mu\left(s\right)=\beta_{0}=0.1 \]

    where \(\beta_{0}=1\) is a unknown constant –this is simple Kriging.

  • the semivariogram is parameterised as

    \[ \gamma\left(h\right)=\gamma_{\text{nug}}\left(h;\sigma_{\text{nug}}^{2}\right)+\gamma_{\text{Exp}}\left(h;\sigma_{\text{Exp}}^{2},\phi_{\text{Exp}}\right) \]

Compute the predictive mean and variance at the unseen locations provided by the dataset in the obj jura.grid in jura {gstat} .

rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df, 
                         coords = c("long", "lat"),      # specify which columns are the coordinates
                         crs = 4326  # CRS code
)
jura.grid.df <- as.data.frame(jura.grid)
##################
#
# Write your code  
#
##################
jura.grid.sf <- st_as_sf(jura.grid.df, 
                         coords = c("long", "lat"),      # specify which columns are the coordinates
                         crs = 4326  # CRS code
)
#
# set the formula of the ternd  
#
frml <- log(Cd) ~ 1
#
# Estimate the variogram  
#
smpl.vgm <- variogram(object = frml, 
               data = jura.pred.sf)
#
par.vgm.model <- vgm(psill = 0.1,
                model = "Nug", 
                range = 0.0
                )
par.vgm.model <- vgm(psill = 0.5,
                 model = "Exp", 
                 range = 1.5,
                 add.to = par.vgm.model
)
par.vgm.est <- fit.variogram(object = smpl.vgm, 
                         model = par.vgm.model
                         )
#
# compute the Kriging equations
# 
sim.krige.est <- gstat( formula = frml,
                    data = jura.pred.sf, 
                    model = par.vgm.est,
                    beta = c(0.1) #for simple kriging, vector with the trend coefficients (including intercept)  
                )
#
# summary
#
summary(sim.krige.est)
##       Length Class  Mode
## data  1      -none- list
## model 1      -none- list
## call  5      -none- call
#
krige.prd <- predict(object = sim.krige.est,     # object of class gstat; output of gstat()
                     newdata = jura.grid.sf # data frame with prediction locations
                     )
## [using simple kriging]
mapview(krige.prd, 
        zcol= "var1.pred", 
        layer.name = "predicted mean"
        )
mapview(krige.prd, 
        zcol= "var1.var", 
        layer.name = "predicted varaince"
        )

6 Cross validation

Cross validation splits the dataset into two sets: a modeling set and a validation set.

The modeling set is used for semivariogram modelling and kriging on the locations of the validation set, and then the validation measurements can be compared to their predictions.

If all went well, cross validation residuals or z-scores computed as \[ z_i = \frac { Z(s_i)-Z_{[i]}(s_i) } { \sigma_{[i]}(s_i) }, i=1,...,n \]

with \(Z_{[i]}(s_i)\) denoting the cross validation prediction for \(s_i\) and \(\sigma_{[i]}(s_i)\) denoting the corresponding kriging standard error are small, have zero mean, and no apparent structure.

Function gstat.cv{gstat} performs cross validation for simple, ordinary or universal point (co)kriging.

Important arguments

The following script

  1. trains a geostatistical model and saves the trainign output in krige.fit
  1. calculates the z-scores of a 5-fold cross validation
  1. plots the z-scores
  1. I observe that at some locations the z-scores are way too large, e.g. larger than +/-3. Perhaps the fitting is not the best, and the model has to be reconsidered by using different parametric semivariogram or by using different different trend or by including additional covariates.
rm(list = ls())
library(sp)
library(sf)
# 0 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )
# 1 train the geostatistical model 
# set the formula of the ternd  
frml <- log(zinc) ~ 1 + sqrt(dist)
# Estimate the variogram  
smpl.vgm <- variogram(object = frml, 
               data = meuse.sf)
#
par.vgm.0 <- vgm(psill = 0.5,
                model = "Nug", 
                range = 0
                )
par.vgm.1 <- vgm(psill = 1,
                 model = "Sph",
                 range = 300,
                 add.to = par.vgm.0
                )
par.vgm.est <- fit.variogram(object = smpl.vgm, 
                         model = par.vgm.1
                         )
# Train  the model
krige.fit <- gstat( formula = frml,
                    data = meuse.sf, 
                    model = par.vgm.est
                )
# 2 Perform n-fold Cross Validation
nf.cv = gstat.cv(krige.fit,    # object of class gstat
                    nfold = 5     
                 )
## [using universal kriging]
## [using universal kriging]
## [using universal kriging]
## [using universal kriging]
## [using universal kriging]
# 3 Plot the z-scores
bubble(obj = nf.cv["zscore"])

Task

Continuing your previous task, where you trained the geostatistical model with difference that

against the dataset in object jura.pred provided by jura {gstat} . The quantity of interest (response) is Cd representing mg cadmium \(kg^{-1}\) topsoil.

  1. Perform a 2-fold cross validation and plot the z-scores.

  2. Is the fitting satisfactory?

rm(list=ls())
# Load the data
library(sp)
library(sf)
library(gstat)
data(jura)
jura.pred.df <- as.data.frame(jura.pred)
jura.pred.sf <- st_as_sf(jura.pred.df, 
                         coords = c("long", "lat"),      # specify which columns are the coordinates
                         crs = 4326  # CRS code
)
jura.grid.df <- as.data.frame(jura.grid)
jura.grid.sf <- st_as_sf(jura.grid.df, 
                         coords = c("long", "lat"),      # specify which columns are the coordinates
                         crs = 4326  # CRS code
)
#
# set the formula of the ternd  
#
frml <- log(Cd) ~ 1
#
# Estimate the variogram  
#
smpl.vgm <- variogram(object = frml, 
               data = jura.pred.sf)
#
par.vgm.model <- vgm(psill = 0.1,
                model = "Nug", 
                range = 0.0
                )
par.vgm.model <- vgm(psill = 0.5,
                 model = "Exp", 
                 range = 1.5,
                 add.to = par.vgm.model
)
par.vgm.est <- fit.variogram(object = smpl.vgm, 
                         model = par.vgm.model
                         )
#
# compute the Kriging equations
# 
sim.krige.est <- gstat( formula = frml,
                    data = jura.pred.sf, 
                    model = par.vgm.est,
                    beta = c(0.1) #for simple kriging, vector with the trend coefficients (including intercept)  
                )

##################
#
# Write your code  
#
##################
# 2 Perform n-fold Cross Validation
nf.cv = gstat.cv(sim.krige.est,    # object of class gstat
                    nfold = 2     
                 )
## [using simple kriging]
## [using simple kriging]
# 3 Plot the z-scores
bubble(obj = nf.cv["zscore"])

#
# 4. 
# I observe that at some locations the z-scores are way too large, e.g. larger than +/-3. Perhaps the fitting is not the best, and the model has to be reconsidered by using different parametric semivariogram or by using different  different trend or by including additional covariates.  

7 Change of support (Block Kriging)

Function predict {gstat} can also perform block kriging prediction.

Additional important input arguments:

Some examples:

The following script computes and prints block Kriging prediction (mean and variance) for blocks of size \(50 \times 50\) given the spatial locations meuse.grid {sp}

rm(list = ls())
library(sp)
library(sf)
# 1 Get the data as sf object
data(meuse)
meuse.df <- meuse
meuse.sf <- st_as_sf(meuse.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )
#
frml <- log(zinc) ~ 1 + sqrt(dist)
#
smpl.vgm <- variogram(object = frml, 
               data = meuse.sf)
#
par.vgm.0 <- vgm(psill = 0.5,
                model = "Nug", 
                range = 0
                )
par.vgm.1 <- vgm(psill = 1,
                 model = "Sph",
                 range = 300,
                 add.to = par.vgm.0
                )
par.vgm.est <- fit.variogram(object = smpl.vgm, 
                         model = par.vgm.1
                         )
#
krige.est <- gstat( formula = frml,
                    locations = meuse.sf, 
                    model = par.vgm.est
                )
data(meuse.grid)
meuse.grid.df <- meuse.grid
meuse.grid.sf <- st_as_sf(meuse.grid.df, 
                  coords = c("x", "y"),      # specify which columns are the coordinates
                  crs = 28992  # CRS code
                  )
#
krige.prd <- predict(object = krige.est, # object of class gstat; output of gstat()
                     newdata = meuse.grid.sf, # data frame with prediction locations
                     block = c(50, 50)
                     )
## [using universal kriging]
#
mapview(krige.prd, 
        zcol= "var1.pred", 
        layer.name = "predicted mean"
        )
mapview(krige.prd, 
        zcol= "var1.var", 
        layer.name = "predicted varaince"
        )

Task

Compute and plot the block Kriging prediction (mean and variance) for blocks of circular shape with radius \(20\), centered on the points of meuse.grid {sp}

#######################
# Write your code below 
#######################
xy <- expand.grid(x = seq(-20, 20, 4), y = seq(-20, 20,4))
xy <- xy[(xy$x^2 + xy$y^2) <= 20^2, ]
krige.prd <- predict(object = krige.est,      # object of class gstat; output of gstat()
                     newdata = meuse.grid.sf, # data frame with prediction locations
                     block = xy
                     )
## [using universal kriging]
#
mapview(krige.prd, 
        zcol= "var1.pred", 
        layer.name = "predicted mean"
        )
mapview(krige.prd, 
        zcol= "var1.var", 
        layer.name = "predicted varaince"
        )